Explorative data visualization#
2023.05.09
In this tutorial, we will show how to use tools in the scientific Python ecosystem for interactive and explorative data visualization.
Goals#
Interactively examine a satellite image using rioxarray and hvplot.
Interactively examine an ipyleaflet map with data editing
Goal 1 procedure#
First you need to install and import rioxarray and hvplot’s xarray module if you intend to run this notebook on the Callysto Hub.
!pip install rioxarray hvplot
import rioxarray
import hvplot.xarray
Note
rioxarray is an experimental package that extends xarray’s capability by adding the rasterio data accessor.
hvplot#
By Holoviz contributors, BSD-3-Clause license
Hvplot is a high-level plotting API in the Python ecosystem. It works with many popular data libraries such as (geo)pandas and xarray (which we’ll see in this and the next hand-on topics) and visualize data using various plotting libraries based on user’s choice.
Load the data#
Let’s use this Sentinel-2 image hosted on the AWS open data repository:
image_url = 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/51/R/UH/2023/3/S2B_51RUH_20230311_0_L2A/TCI.tif'
The full asset associated with this true-color image (TCI) can be accessed using this STAC browser. You can see that the true-color image has a label called COG, which stands for cloud-optimized geotiff.
Now we use rioxarray to fetch the data without downloading it:
# da stands for data array
da = rioxarray.open_rasterio(image_url)
da
<xarray.DataArray (band: 3, y: 10980, x: 10980)>
[361681200 values with dtype=uint8]
Coordinates:
* band (band) int64 1 2 3
* x (x) float64 3e+05 3e+05 3e+05 ... 4.098e+05 4.098e+05 4.098e+05
* y (y) float64 2.8e+06 2.8e+06 2.8e+06 ... 2.69e+06 2.69e+06
spatial_ref int64 0
Attributes:
AREA_OR_POINT: Area
OVR_RESAMPLING_ALG: AVERAGE
_FillValue: 0
scale_factor: 1.0
add_offset: 0.0This gridded file is now represented using the xarray data structure. For more geospatial information, you can call the rasterio accessor’s attributes and functions, such as:
print(f"Grid width: {da.rio.width} pixels")
print(f"Grid height: {da.rio.height} pixels")
print(f"Geospatial transform (m): \n{da.rio.transform()}")
Grid width: 10980 pixels
Grid height: 10980 pixels
Geospatial transform (m):
| 10.00, 0.00, 300000.00|
| 0.00,-10.00, 2800020.00|
| 0.00, 0.00, 1.00|
You probably won’t feel you have downloaded any data so far, despite that you have seen the metadata shown above. This is why a cloud-optimized geotiff is powerful for cloud computing – we will only download the minimum necessary part of data. By the way, this TCI.tif file has a size of 276 MB (and you can check that by a manual download), so downloading the full dataset would have taken tens of seconds or so.
We will talk more about cloud-optimized geotiffs during the next hands-on topic (open data sets in remote sensing) and the corresponding lectures.
Interactive visualization#
This raster image has ~120 million pixels, and it would take a lot of time for any visualization tools to read them all and render it on your screen. I have commented out this line, but feel free to try it if you don’t mind waiting:
# da.hvplot.image(x='x', y='y', data_aspect=1)
I have examined the data and decided to have a smaller slice for furthur visualization:
# Here x and y slices are based on the UTM coordinates.
da_subset = da.sel(x=slice(320000, 330000), y=slice(2780000, 2770000))
Now it’s time to visualize the cropped data using hvplot:
da_subset.hvplot.image(x='x', y='y', data_aspect=1)
Note that you can switch the band view by sliding the bar in the right.
The rectangular blocks in the image are buildings and runways at the Taoyuan airport. You can better recognize it on the True color image:
da_subset.hvplot.rgb(x='x', y='y', data_aspect=1, bands='band')
Some questions for you!#
Can you invert the colormap so bright colors represent high reflectance, which will be easier for readers to intepret?
By reading the hvplot gridded data tutorial, can you make a line plot with three lines, each representing the DN value of a specific band along the x coordinate at a fixed y coordinate (say
y = 2774000)? Like this:What band number is Red? How about Green and Blue? How do you know that?
Show code cell source
# Check here for question 1! https://holoviews.org/user_guide/Colormaps.html
da_subset.sel(y=slice(2774000, 2773960)).hvplot.line(x='x', by='band', groupby='y')
Goal 2 procedure#
TBD